
[FEAT] Enable buffered iteration on plans #2566

Merged · 6 commits into main · Jul 30, 2024
Conversation

@jaychia (Contributor) commented Jul 26, 2024

Helps close part of #2561

This PR adds buffering of result partition tasks, preventing "runaway execution" when multiple executions run concurrently.

The problem previously was that if we ran two executions in parallel (e1 and e2) on a machine with 8 CPUs:

  1. e1 could potentially run 8 tasks and keep them buffered (not releasing the resource request)
  2. When e2 attempts to run the next task, it notices that the task cannot be admitted on the system (due to memory constraints)
    • e2 concludes that it is deadlocked, because the pyrunner today strongly assumes that if a task cannot be admitted, the execution only needs to wait for some of its own tasks to finish.
    • However, e2 has no tasks currently pending (it is starved): the pending tasks are all buffered in e1. It therefore incorrectly reports a deadlock.

Solution

  • This PR sets the default buffer size to 1, instead of allowing each execution to buffer as many tasks as it wants
  • We introduce logic in the physical plan to have an upper limit on the size of the materialization buffer. If that buffer gets too large, it will start yielding None to indicate that the plan is unable to proceed.

Note that a problem still remains, e.g. when running more than NUM_CPUS executions concurrently. That can be solved in a follow-up PR refactoring the way we do resource accounting.
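The buffering behavior described above can be sketched roughly as follows. This is an illustrative toy, not Daft's actual physical-plan code: `Task`, `buffered_plan`, and `max_buffer_size` are hypothetical names, and the real materialization buffer lives inside the physical plan generators.

```python
from collections import deque


class Task:
    """Toy stand-in for a partition task (hypothetical interface)."""

    def __init__(self, name):
        self.name = name
        self._done = False

    def finish(self):
        self._done = True

    def done(self):
        return self._done


def buffered_plan(tasks, max_buffer_size=1):
    """Yield tasks for execution, but never keep more than
    `max_buffer_size` unfinished tasks buffered. When the buffer is
    full, yield None so the caller knows the plan is blocked (waiting
    on materialization) rather than deadlocked.
    """
    buffer = deque()
    for task in tasks:
        while len(buffer) >= max_buffer_size:
            if buffer[0].done():
                buffer.popleft()  # a result materialized; free a slot
            else:
                yield None  # backpressure signal: cannot proceed yet
        buffer.append(task)
        yield task
```

With `max_buffer_size=1`, a second execution never sees one plan hoarding every CPU: after admitting one task, the plan yields `None` until that task's result is consumed, releasing resources for other executions.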

@github-actions github-actions bot added the enhancement New feature or request label Jul 26, 2024

codecov bot commented Jul 29, 2024

Codecov Report

Attention: Patch coverage is 91.35802% with 7 lines in your changes missing coverage. Please review.

Please upload a report for BASE (main@4701290).
Report is 3 commits behind head on main.

Additional details and impacted files


@@           Coverage Diff           @@
##             main    #2566   +/-   ##
=======================================
  Coverage        ?   64.02%           
=======================================
  Files           ?      951           
  Lines           ?   107920           
  Branches        ?        0           
=======================================
  Hits            ?    69101           
  Misses          ?    38819           
  Partials        ?        0           
Files with coverage changes (patch coverage in angle brackets):
daft/plan_scheduler/physical_plan_scheduler.py 90.90% <100.00%> (ø)
daft/runners/ray_runner.py 90.26% <100.00%> (ø)
daft/dataframe/dataframe.py 88.29% <80.00%> (ø)
daft/runners/pyrunner.py 93.57% <83.33%> (ø)
daft/execution/physical_plan.py 94.59% <91.93%> (ø)

@desmondcheongzx (Contributor) left a comment


Looks good! Some minor nits that would be good to fix if the docstrings are public-facing.

Two review comments on daft/dataframe/dataframe.py (outdated, resolved)
jaychia and others added 2 commits July 29, 2024 20:06
@jaychia jaychia enabled auto-merge (squash) July 30, 2024 03:08
@jaychia jaychia merged commit 4fec71c into main Jul 30, 2024
44 checks passed
@jaychia jaychia deleted the jay/fix-concurrent-iters branch July 30, 2024 03:27
jaychia added a commit that referenced this pull request Jul 30, 2024
Together with #2566 , closes #2561 

This PR changes the way the PyRunner performs resource accounting.
Instead of updating the number of CPUs, GPUs and memory used only when
futures are retrieved, we do this just before each task completes. These
variables are protected with a lock to allow for concurrent access from
across worker threads.

Additionally, this PR now tracks the inflight `Futures` across all
executions globally in the PyRunner singleton. This is because there
will be instances where a single execution might not be able to make
forward progress (e.g. there are only 8 CPUs available, and there are 8
other currently-executing partitions). In this case, we need to wait for
**some** execution globally to complete before attempting to make
forward progress on the current execution.

---------

Co-authored-by: Jay Chia <[email protected]@users.noreply.github.com>
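The lock-protected resource accounting described in that follow-up commit can be sketched as below. This is a minimal illustration under assumed names (`ResourceAccountant`, `try_admit`, `release` are not Daft's actual API); the real PyRunner additionally tracks inflight futures globally across executions.

```python
import threading


class ResourceAccountant:
    """Illustrative sketch: global CPU/memory accounting shared by all
    executions, guarded by a lock so worker threads can update it
    concurrently without races.
    """

    def __init__(self, total_cpus, total_memory):
        self._lock = threading.Lock()
        self._free_cpus = total_cpus
        self._free_memory = total_memory

    def try_admit(self, cpus, memory):
        """Admit a task only if its resource request fits right now."""
        with self._lock:
            if cpus <= self._free_cpus and memory <= self._free_memory:
                self._free_cpus -= cpus
                self._free_memory -= memory
                return True
            return False

    def release(self, cpus, memory):
        """Return a task's resources as it completes, so a starved
        execution elsewhere can make forward progress."""
        with self._lock:
            self._free_cpus += cpus
            self._free_memory += memory
```

Because admission and release both take the same lock, an execution that fails `try_admit` can safely wait for any other execution's task to call `release`, rather than assuming the blocking task belongs to itself.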